545 research outputs found
Assessment of Stability in Partitional Clustering Using Resampling Techniques
The assessment of stability in cluster analysis is closely related to the difficult core problem of determining the number of clusters present in the data. The latter is the subject of many investigations and papers that consider different resampling techniques as practical tools. In this paper, we consider non-parametric resampling from the empirical distribution of a given dataset in order to investigate the stability of the results of partitional clustering. Specifically, we investigate only the very popular K-means method. Estimating the sampling distribution of the adjusted Rand index (ARI) and the averaged Jaccard index seems to be the most general way to do this. In addition, we compare bootstrapping with different subsampling schemes (i.e., with different cardinalities of the drawn samples) with respect to their performance in finding the true number of clusters for both synthetic and real data.
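As a rough illustration of the resampling scheme described in this abstract (not the authors' code), the stability of a K-means solution can be probed by refitting on bootstrap samples and collecting the ARI between a reference partition and each bootstrap-induced partition. The function names and the nearest-centroid relabeling step are our own assumptions:

```python
import numpy as np

def adjusted_rand_index(a, b):
    """ARI between two labelings of the same points (Hubert & Arabie)."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    _, ia = np.unique(a, return_inverse=True)
    _, ib = np.unique(b, return_inverse=True)
    C = np.zeros((ia.max() + 1, ib.max() + 1), dtype=np.int64)
    np.add.at(C, (ia, ib), 1)                      # contingency table
    comb2 = lambda x: x * (x - 1) // 2             # "choose 2", elementwise
    sum_ij = comb2(C).sum()
    sum_a, sum_b = comb2(C.sum(1)).sum(), comb2(C.sum(0)).sum()
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

def kmeans(X, k, n_iter=100, seed=None):
    """Plain Lloyd's algorithm; returns labels and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if (labels == j).any()
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def bootstrap_ari(X, k, B=50, seed=0):
    """Sampling distribution of the ARI under non-parametric bootstrapping:
    cluster each bootstrap sample, relabel all points by their nearest
    bootstrap centroid, and compare with the reference partition."""
    rng = np.random.default_rng(seed)
    ref, _ = kmeans(X, k, seed=int(rng.integers(1 << 31)))
    aris = []
    for _ in range(B):
        idx = rng.integers(0, len(X), len(X))      # n draws with replacement
        _, centers = kmeans(X[idx], k, seed=int(rng.integers(1 << 31)))
        boot = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        aris.append(adjusted_rand_index(ref, boot))
    return np.array(aris)
```

Running `bootstrap_ari` over a range of candidate k and preferring the k with the highest, most concentrated ARI distribution is one common way to estimate the number of clusters; the averaged Jaccard index mentioned in the abstract can be substituted for the ARI in the same loop.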
Big data clustering: Data preprocessing, variable selection, and dimension reduction
[No abstract available]
Classification and clustering: models, software and applications
We are pleased to present the report on the 30th Fall Meeting of the working group "Data Analysis and Numerical Classification" (AG-DANK) of the German Classification Society. The meeting took place at the Weierstrass Institute for Applied Analysis and Stochastics (WIAS), Berlin, from Friday, Nov. 14, to Saturday, Nov. 15, 2008. Twelve years earlier, WIAS had hosted a traditional Fall Meeting with a special focus on classification and multivariate graphics (Mucha and Bock, 1996). This time, the special topics were stability of clustering and classification, mixture decomposition, visualization, and statistical software.
Finding Roman brickyards in Germania Superior by model-based cluster analysis of archaeometric data
Chemical analysis of ancient ceramics and other archaeologically important materials has frequently been used to support archaeological research. Often the dimensionality of the measurements is high, so multivariate statistical techniques such as cluster analysis have to be applied. The aim of the present paper is to review the research on bricks and tiles from Roman military brickyards in Germania Superior and to present the main results obtained by multivariate statistical analysis. In particular, new adaptive cluster analysis methods and modified model-based clustering are applied to archaeometric data (Mucha/Bartel/Dolata 2002; 2003a; 2005b; in press; Bartel/Dolata/Mucha 2000; 2003). The main result was the discovery of military brickyards that were not known when the project began about ten years ago; they were found through the application of these multivariate statistical models. Newly developed visualization methods support and facilitate the interpretation of both the data set and the results of the grouping, so archaeologists can easily identify a new find of a Roman brick or tile by comparing its chemical fingerprint with those of the detected provenances.
Validation of K-means Clustering: Why Is Bootstrapping Better Than Subsampling?
In simulation studies based on many synthetic and real datasets, we found that subsampling performs worse than bootstrapping at finding the true number of clusters K (Mucha and Bartel 2014, 2015; Mucha 2016). But why? Based on further investigations, here concerning K-means clustering and the comparison of bootstrapping with a special version of subsampling named "Boot2Sub", we try to answer this question. In subsampling, a parameter H, the cardinality of the drawn subsample, usually has to be pre-specified, and its specification poses an additional serious problem. A way out is to take the bootstrap sample but discard the multiple (duplicated) points; we call this special subsampling scheme "Boot2Sub". Bootstrapping and Boot2Sub subsampling then yield exactly the same set of distinct observations, which allows fair, direct comparisons of their performance. As a result of applications to generated and real datasets, the conjecture arises that multiple points play an important role in the validation of the true number of clusters in K-means clustering.
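The "Boot2Sub" construction is simple enough to sketch. Assuming only the description in the abstract (draw a bootstrap sample, then discard the duplicated points), a hypothetical helper might look like this; the function name is ours, not the paper's:

```python
import numpy as np

def boot_and_boot2sub(n, seed=None):
    """Draw bootstrap indices (n draws with replacement) and derive the
    'Boot2Sub' subsample by discarding multiple points, so that both
    samples contain exactly the same set of distinct observations."""
    rng = np.random.default_rng(seed)
    boot = rng.integers(0, n, size=n)   # bootstrap sample: indices, with multiples
    sub = np.unique(boot)               # Boot2Sub: same observations, no multiples
    return boot, sub
```

Note that the subsample cardinality H no longer needs to be pre-specified: it is random, with expected value n·(1 − (1 − 1/n)^n) ≈ 0.632·n for large n. Clustering both index sets with the same K-means procedure then isolates the effect of the multiple points, which is exactly the fair comparison the abstract describes.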
Multidifferential study of identified charged hadron distributions in Z-tagged jets in proton-proton collisions at √s = 13 TeV
Jet fragmentation functions are measured for the first time in proton-proton collisions for charged pions, kaons, and protons within jets recoiling against a Z boson. The charged-hadron distributions are studied longitudinally and transversely to the jet direction for jets with transverse momentum greater than 20 GeV in the pseudorapidity range 2.5 < η < 4. The data sample was collected with the LHCb experiment at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 1.64 fb⁻¹. Triple differential distributions as a function of the hadron longitudinal momentum fraction, hadron transverse momentum, and jet transverse momentum are also measured for the first time. This helps constrain transverse-momentum-dependent fragmentation functions. Differences in the shapes and magnitudes of the measured distributions for the different hadron species provide insights into the hadronization process for jets predominantly initiated by light quarks.
Comment: All figures and tables, along with machine-readable versions and any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-013.html (LHCb public pages).
Study of the decay
The decay is studied in proton-proton collisions at a center-of-mass energy of TeV using data corresponding to an integrated luminosity of 5 collected by the LHCb experiment. In the system, the state observed at the BaBar and Belle experiments is resolved into two narrower states, and , whose masses and widths are measured, where the first uncertainties are statistical and the second systematic. The results are consistent with a previous LHCb measurement using a prompt sample. Evidence of a new state is found with a local significance of ; its mass and width are measured to be and , respectively. In addition, evidence of a new decay mode is found with a significance of . The relative branching fraction of with respect to the decay is measured to be , where the first uncertainty is statistical, the second systematic, and the third originates from the branching fractions of charm hadron decays.
Comment: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-028.html (LHCb public pages).
Measurement of the ratios of branching fractions R(D*) and R(D0)
The ratios of branching fractions R(D*) and R(D0) are measured, assuming isospin symmetry, using a sample of proton-proton collision data corresponding to 3.0 fb⁻¹ of integrated luminosity recorded by the LHCb experiment during 2011 and 2012. The tau lepton is identified in the decay mode τ⁻ → μ⁻ν̄μντ. The measured values are and , where the first uncertainty is statistical and the second is systematic. The correlation between these measurements is . Results are consistent with the current average of these quantities and are at a combined 1.9 standard deviations from the predictions based on lepton flavor universality in the Standard Model.
Comment: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2022-039.html (LHCb public pages).